DOMESTIC-ANIMAL GENOMICS:
DECIPHERING THE GENETICS OF 
COMPLEX TRAITS 



Leif Andersson** and Michel Georges* 

One of the 'grand challenges' in modern biology is to understand the genetic basis of 
phenotypic diversity within and among species. Thousands of years of selective breeding of 
domestic animals has created a diversity of phenotypes among breeds that is only matched 
by that observed among species in nature. Domestic animals therefore constitute a unique 
resource for understanding the genetic basis of phenotypic variation. When the genome 
sequences of domestic animals become available, the identification of the mutations that
underlie the transformation from a wild to a domestic species will be a realistic and 
important target. 

other animals have had their phenotypes monitored 
as closely as the principal domestic species. Moreover, 
thousands of years of selective breeding of these 
species has led to marked phenotypic changes and 
genetic adaptation to various environmental condi- 
tions. So, populations of domestic animals have a 
rich collection of mutations that affect phenotypic 
traits. Some of these traits, such as coat colour, have a 
simple monogenic basis, but most, such as growth, 
fertility and behaviour, are complex multifactorial 
characters. 

The advantages of domestic animals will become 
increasingly important as we move into the post- 
genomic era. Despite the fact that we now know the 
complete or near-complete genome sequences of several 
organisms, our knowledge of the genes that underlie 
phenotypic differences within and among species is 
rudimentary. For example, human geneticists have had 
remarkable success in identifying the genes and muta- 
tions that underlie disorders with a simple monogenic 
inheritance, but the identification of genes that underlie 
disorders or traits with a complex genetic basis has 
proved difficult despite considerable efforts. Similarly,
our knowledge of the genetic basis of phenotypic 
variation among species is sketchy at best. Indeed, to 
"Understand evolutionary variation across species and 
During his voyage around the world, Charles Darwin's
observations of the wealth of phenotypic diversity in 
nature inspired him to develop the theory of evolution 
by means of natural selection. However, this voyage 
took place more than 20 years before he published his 
seminal book On the Origin of Species by Means of 
Natural Selection 1 : in the meantime Darwin collected 
data to support his theory. The selective breeding of 
farm animals provided a large amount of these data. 
The phenotypic changes that were seen in farm animals 
that were subjected to selection essentially provided a 
'proof-of-principle' for his theory. In fact, Darwin himself
carried out breeding experiments with doves and he
subsequently published two volumes on The Variation 
in Animals and Plants under Domestication 1 . 

Despite this early emphasis on the evolution of 
phenotypic variation among domestic-animal popu- 
lations, these species were rapidly superseded as the 
models of choice after the rediscovery of Mendelian 
genetics. Practitioners of this new, largely laboratory-based
discipline favoured cheaper and easier-to-breed
organisms with shorter generation times, such as the 
mouse and Drosophila melanogaster. Nonetheless, in 
terms of dissecting the genetic basis of phenotypic 
diversity, domestic animals have some notable advan- 
tages when compared with model organisms. No 
Box 1 | Comparative genomics, phenotypic variation and domestic animals

Comparative genomics is an important approach for dissecting the genetic basis 
of phenotypic variation. For example, there is considerable interest in comparing 
human and chimpanzee genomes to identify the genes that underlie the 
phenotypic traits that make us human 5 *" 54 . However, comparing genomic 
sequences among species does not allow functionally important substitutions to 
be easily identified, unless a mutation occurs in the coding sequence of a gene with 
a well-known function. The analysis of phenotypic extremes within species might 
provide a complementary approach to among-species comparative genomics. 
Charles Darwin used domestic-animal species as model organisms and they 
might once again provide a model system for understanding phenotypic 
evolution. Different breeds of domestic animals are in many cases as 
phenotypically diverse as separate species (see figure). However, the recent origin
of these breeds from their wild ancestors (~10,000 years before present) makes it
possible to make specific crosses and use segregation analysis to map the genes 
that underlie phenotypic traits. Moreover, owing to their recent origin, the 
sequence divergence among breeds of domestic animals is negligible, which makes 
it easier to identify the causative mutations that underlie phenotypic differences. 
By contrast, humans and chimpanzees diverged from a common ancestor millions
of years before present and the average sequence divergence between the two
species is ~1.2% (REF. 55). This corresponds to ~40,000,000 nucleotide
substitutions plus numerous insertion and/or deletion differences. So, functionally important substitutions between
humans and chimpanzees constitute the tip of an iceberg of changes that are selectively neutral. 
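As a rough check on the figures quoted above, the number of substitutions implied by a given sequence divergence can be worked out directly. The sketch below assumes a haploid genome size of about 3.2 Gb; that value is not stated in the text and is used only for illustration, and the function name is hypothetical.

```python
def expected_substitutions(genome_size_bp: float, divergence: float) -> float:
    """Expected number of single-nucleotide differences between two aligned genomes."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be a fraction between 0 and 1")
    return genome_size_bp * divergence


# Assumption (not from the text): an approximate haploid genome size of 3.2 Gb.
ASSUMED_GENOME_SIZE_BP = 3.2e9
# From the text: average human-chimpanzee sequence divergence of about 1.2%.
HUMAN_CHIMP_DIVERGENCE = 0.012

n_subs = expected_substitutions(ASSUMED_GENOME_SIZE_BP, HUMAN_CHIMP_DIVERGENCE)
print(f"Implied substitutions: {n_subs:,.0f}")
```

Under this assumed genome size the result is a little under 40 million, which is consistent with the order of magnitude given in the box.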

Nonetheless, among-species comparative genomics will be crucial for pinpointing functionally important non-coding
sequences. The comparison of the human and mouse genome sequences showed that ~5% of the mammalian genome is
evolutionarily conserved (REF. 56). However, only ~1.5% of these genomes encode proteins, which indicates that the remaining
evolutionarily conserved fraction of these genomes (~3.5%) is non-coding. It is clear that the comparison of genome
sequences from numerous species markedly improves the chance of recognizing evolutionarily conserved sequences (REFS 57,58).
A striking illustration of this concept is provided by the recent identification of the causative mutation underlying the
insulin-like growth factor 2 (IGF2) quantitative trait locus (QTL) in pigs (BOX 3). The mutation occurs in the middle of an
intron and was identified by genetic analysis. A bioinformatic analysis of sequences from eight mammalian species showed
that the 100 base pairs (bp) flanking the mutation have ~85% sequence identity among distantly related mammals, as high
as for most coding sequences, and the mutation is part of a 16-bp segment that has 100% identity among all eight species (REF. 7).
Consequently, sequence variation that is found in such evolutionarily conserved regions can be treated as a primary candidate
for causing phenotypic differences. The photograph of the boy with a female gorilla was provided by L.A.






the mechanisms underlying it" has been named as one 
of the 'grand challenges' of future genomics research 5 . 

Comparative genomics will have a big role in 
addressing this challenge. Comparative analyses of the 
genomes of different domestic breeds might prove to be 
one of the most efficient ways of dissecting the genetic 
basis of phenotypic variation. The large phenotypic dif- 
ferences and the limited neutral genetic variation 
among breeds make them ideal candidates for study 
(BOX 1). The identification of the mutations that underlie
the variation of several interesting monogenic phenotypic
traits in domestic animals (TABLE 1), as well as some
mutations that underlie complex traits*" 10 , has already 
illustrated the potential of domestic animals for uncov- 
ering the genes that underlie phenotypic diversity. It is 
also worthwhile noting that many quantitative trait loci 
(QTLs) that affect a broad range of phenotypes, including
growth, body composition and fertility, have
already been mapped with high confidence in the
different livestock species and are awaiting further characterization
(for example, see REF. 11).

However, the lack of genomic resources in domes- 
tic animals, as compared with species such as human 
and mouse, has hampered progress in gene mapping 
and identification. This situation will change markedly 



in the near future with the completion of draft genome 
sequences for key domestic animals. The chicken and 
dog sequences are already underway and the cow will 
follow soon after. Animal geneticists will then have a near 
complete list of all coding sequences, their chromosomal 
location, numerous genetic markers and the possibility 
to generate gene arrays for highly informative expression 
analyses. So, it will soon be possible to exploit the full 
potential of farm-animal genomics. Here, with this 
potential in mind, we provide an overview of domestic- 
animal genomics and the potential boost it will offer to 
our future understanding of complex traits. First, we dis- 
cuss the challenge that is presented by the genetic analy- 
sis of multifactorial traits and the way that this challenge 
has been addressed in domestic animals. We then sum- 
marize the present status of the diverse array of domes- 
tic-animal genome projects that are underway. Finally, 
we outline how the completion of these genome projects 
will facilitate complex-trait analysis in the future. 

The challenge of multifactorial traits

Most biological traits and all common diseases in 
humans have a multifactorial (or complex) inheritance, 
which indicates that they are influenced by numerous 
genes and environmental factors. A chromosomal 
Table 1 | Monogenic trait loci for which the causative mutation has been identified

Species   Trait                                    Gene                   Reference
Cattle    Muscle hypertrophy                       MSTN                   72-75
          Coat colour                              MC1R                   76
          White coat colour                        KITLG                  23
          Fish odour in milk                       FMO3                   77
Chicken   Albinism                                 TYR                    78
          Plumage colour                           MC1R                   79
          Dominant white plumage colour            PMEL17                 80
Dog       Narcolepsy                               HCRTR2                 81
          Coat colour                              MC1R                   82
Goat      Lack of horns, intersexuality            Non-coding region*     83
Horse     Coat colour                              MC1R                   84
                                                   ASIP                   85
                                                   MATP                   86
          White colour, megacolon                  EDNRB                  87-89
Pig       Malignant hyperthermia                   RYR1                   21
          Dominant white colour, haematopoiesis    KIT                    22,90,91
          Hypercholesterolaemia                    LDLR                   92
          Coat colour                              MC1R                   93,94
          Intestinal Escherichia coli adherence    FUT1                   95
          Glycogen content in skeletal muscle      PRKAG3                 24
Sheep     Fertility, ovulation rate                BMP15                  96
                                                   BMPR1B                 97
          Muscle hypertrophy                       Regulatory mutation*   98,99

*This is apparently a regulatory mutation that affects the expression of one or more genes in the chromosomal region to which it maps.



COMPOSITE INTERVAL MAPPING
AND MULTIPLE QTL MAPPING
Methods that increase
quantitative trait locus (QTL)
mapping resolution in a
chromosome interval of interest
by accounting for genetic
background noise due to
segregation at other QTLs by
means of the inclusion of
multiple markers as cofactors in
the statistical model.

EPISTASIS 

The phenotypic expression of 
genotypes at one locus depends 
on the genotype at another locus 
or other loci. 



region that contains one or more genes that influence a
multifactorial trait is known as a QTL. The use of
segregation analysis in informative families or experimental
crosses to map QTLs is well established.
The power of such analyses to detect and map QTLs
depends on how large a fraction of the phenotypic variation
is explained by a given locus and on the size of the
segregating population.
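The dependence of detection power on these two quantities can be made concrete with a standard textbook approximation, which is not taken from the text: for a single-locus regression test, the non-centrality parameter is roughly N q / (1 - q), where N is the number of phenotyped individuals and q is the fraction of phenotypic variance explained by the locus. The function names below are hypothetical, and the sketch ignores marker spacing, incomplete linkage and multiple testing.

```python
def noncentrality(n_individuals: int, variance_explained: float) -> float:
    """Approximate non-centrality parameter of a single-locus regression test.

    Assumes the QTL genotype is observed directly (a simplification).
    """
    if not 0.0 <= variance_explained < 1.0:
        raise ValueError("variance_explained must be in [0, 1)")
    return n_individuals * variance_explained / (1.0 - variance_explained)


def required_sample_size(variance_explained: float, target_ncp: float) -> float:
    """Population size needed to reach a target non-centrality parameter."""
    if not 0.0 < variance_explained < 1.0:
        raise ValueError("variance_explained must be in (0, 1)")
    return target_ncp * (1.0 - variance_explained) / variance_explained


# Illustrative values only: a locus explaining 5% versus 1% of the variance.
for q in (0.05, 0.01):
    print(q, round(noncentrality(500, q), 2), round(required_sample_size(q, 25.0)))
```

Halving the variance explained roughly doubles the population size needed for the same signal, which is why loci of small effect call for very large segregating populations.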

The principal challenge with multifactorial traits lies
not in detecting QTLs, but in unravelling the genes that
underlie them. Despite large efforts to identify the genes
that affect multifactorial traits, in particular those that are
involved in common human diseases, there are few success
stories. The identification of genes and mutations
that underlie QTLs is problematic for several reasons.
First, it remains difficult to determine the exact chromosomal
location of a QTL. As for monogenic Mendelian
traits, the marker and crossover density in the region of
interest limits the mapping precision. The fuzzy
'detectance' of QTLs, that is, the probability of a QTL
genotype given the phenotype, complicates matters even
more. The lack of a direct relationship between genotype
and phenotype, as exists for monogenic traits, prohibits
the unambiguous identification of recombinant individuals
that is required for high-resolution mapping. This is
because individual QTLs only account for part
of the phenotypic variance, the rest being due to environmental
factors as well as other QTLs. The situation is particularly
tricky in the case of numerous loosely linked
QTLs, which seem to account for a significant proportion
of the largest QTL effects that are detected in livestock
(M.G., unpublished observations). Although composite
interval mapping and multiple QTL mapping might help to
unravel such situations in experimental crosses between
inbred lines, the development of a suitable statistical
framework to address such situations in outbred designs
is only in its infancy (REF. 100). Epistatic interactions might also
add to the challenge of dissecting the genetic basis of
complex traits (BOX 2). Certainly, data from model organisms
and from coat-colour inheritance in mammals indicate
that epistasis between QTLs might be an important
factor for consideration. However, studies of epistasis
among QTLs in vertebrates have been rare. For these
reasons, QTLs are often mapped to chromosomal regions
that are over 20 centiMorgan (cM) long (~20 megabase
pairs (Mb)) and that might contain several hundred
genes.

Second, most QTLs have a mild phenotypic effect, so
the mutations that cause them are difficult to distinguish
from neutral polymorphisms. By contrast, mutations
that cause monogenic disorders generally knock
out gene expression or lead to an altered protein function.
Another factor that complicates the identification
of QTL mutations is that a good proportion
of these are likely to be regulatory mutations. Our ability to
spot and evaluate functionally important mutations in
non-coding regions is still poorly developed. For example,
the identification and annotation of regulatory
regions in sequenced genomes is still very rudimentary
compared with that of coding sequences, although our
ability to identify important regulatory elements
through sequence comparison will improve as more
EFFECTIVE POPULATION SIZE
The number of individuals in a
theoretically ideal population
that are subject to the same
amount of genetic drift as the
actual population.

HETEROZYGOSITY
The frequency of heterozygotes
at a locus.

LINKAGE DISEQUILIBRIUM
The non-random association of
alleles at different loci.

MALIGNANT HYPERTHERMIA
AND HALOTHANE SENSITIVITY
A disorder in which uncontrolled
muscle contractions can cause
lethal overheating. In pigs, this
pathological condition might be
induced by stress or exposure to
halothane anaesthesia.
Susceptibility to malignant
hyperthermia in pigs and in some
human families is caused by
mutations in the RYR1 gene,
which encodes ryanodine
receptor 1.

MÜLLERIAN DUCTS
The structures from which the
vagina, cervix, uterus and
oviducts derive in the female
embryo.

EPIGENETIC INHERITANCE
Inheritance of a molecular
modification of DNA
(methylation or chromatin
structure) that affects gene
expression.



mammalian genomes are sequenced (BOX 1). The situation
might be even more complicated if QTL effects
reflect the combined action of clusters of tightly linked
mutations, or if epigenetic inheritance contributes to
quantitative genetic variation (BOX 2).

Genetic dissection of complex traits 

So, it is clear that dissecting the genetic basis of complex 
traits presents a significant analytical challenge, and for 
this reason the systems in which we have more power to 
detect QTLs and the mutations that underlie them are 
of particular interest. Domestic animals are one such 
system. For several reasons, the power to detect the 
mutations in QTLs that underlie variation in complex 
traits is much better in domestic animals than in human 
families. First, there is less genetic heterogeneity within 
breeds owing to the limited effective population size compared
with large outbred human populations. This
means that the number of segregating QTLs and the 
number of alleles at each locus that affect a certain trait 
is expected to be lower within populations of domestic 
animals. Second, large family sizes in domestic animals 
(hundreds or thousands of progeny in some species) 
make it possible to deduce the QTL genotype of the par- 
ents with confidence by using progeny testing — that is, 
heterozygosity can be deduced on the basis of the pres- 
ence of QTL segregation, and homozygosity can be 
inferred by the lack of segregation if the family size is 
sufficiently large in relation to the expected size of the 
QTL effect. Nonetheless, there are still relatively few
examples for which the mutations that underlie 
mapped QTLs have been identified in domestic ani- 
mals. These few examples have been identified either 
because a gene that causes a monogenic trait has 



Box 2 | Epistasis, epigenetic inheritance and quantitative trait loci

There is good evidence that epistasis between quantitative trait loci (QTLs) has an
important effect on multifactorial traits. In particular, the significance of epistatic
interaction among the genes that control coat colour in mammals is well established,
and data from experimental organisms highlight the importance of epistasis (REF. 40). So, some
QTLs will only have an effect on certain genetic backgrounds. However, the significance
of epistatic interactions has not yet been extensively studied in non-experimental
organisms, partly owing to an expected lack of statistical power in most studies and
partly because of the lack of computer software that can handle the demanding
statistical analysis. Even so, recent studies indicate that epistatic interaction contributes
significantly to quantitative variation.

Does epigenetic inheritance contribute to the complex genetic background for
multifactorial traits? Although the general model is that epigenetic imprints are erased
during the development of a new individual (REF. 61), there is evidence that epigenetic imprints
in the form of DNA-methylation patterns or chromatin configuration can be
transmitted from parent to offspring and could influence the phenotype of the
offspring. One example is the expression of coat colour in mice that have inherited the
viable yellow allele at the Agouti locus from their dams (REF. 62). In this case the incomplete
erasure of an epigenetic modification at a retrotransposon, inserted upstream of the
Agouti gene, when the viable yellow allele was transmitted through the maternal
germline, affects the phenotype. Evidence for epigenetic inheritance is also well
documented in yeast, Drosophila melanogaster and plants. This leads to the
unorthodox view that a portion of the inherited variation in multifactorial traits cannot
be explained by differences in the nucleotide sequence itself, but by differences in the degree of
methylation or in the chromatin configuration.



pleiotropic effects on several complex traits, or by
adopting a positional candidate approach combined
with linkage-disequilibrium analysis.
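The progeny-testing argument made earlier in this section (a sire is judged heterozygous at a QTL when its progeny segregate, and homozygous when a sufficiently large family shows no segregation) can be sketched as a simple two-group comparison. The example below is a minimal illustration and not the authors' method: the function names, the threshold of 3.0 and the phenotype values are all hypothetical, and a real analysis would model the family structure and marker information explicitly.

```python
import math
import statistics


def segregation_t(group_a: list[float], group_b: list[float]) -> float:
    """Welch t-statistic contrasting progeny that inherited alternative
    paternal marker haplotypes (group_a versus group_b)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    std_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / std_error


def call_sire(group_a: list[float], group_b: list[float], threshold: float = 3.0) -> str:
    """Classify a sire from the segregation seen among his progeny.

    The threshold is an arbitrary illustrative choice, not a recommended value.
    """
    if abs(segregation_t(group_a, group_b)) > threshold:
        return "heterozygous"
    return "no segregation detected"


# Hypothetical phenotypes of progeny sorted by the paternal haplotype they inherited.
progeny_hap1 = [10.2, 10.8, 11.1, 10.5, 10.9, 11.3]
progeny_hap2 = [9.1, 9.6, 9.4, 9.9, 9.2, 9.7]
print(call_sire(progeny_hap1, progeny_hap2))
```

A verdict of "no segregation detected" only supports homozygosity when the family is large relative to the expected QTL effect, which is the caveat the text attaches to this inference.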

Genes with pleiotropic effects on complex traits. The 
identification of genes that cause monogenic traits is 
straightforward. The direct relationship between geno- 
type and phenotype allows the gene responsible to be 
mapped with high resolution. However, a gene that has a 
large effect on a monogenic trait can have minor effects 
on other complex traits. In effect, such a gene is behaving 
as a QTL for the complex traits that it influences. This is 
well illustrated by some examples from domestic animals.
For instance, a missense mutation in the pig RYR1
gene, which encodes a calcium channel that is expressed
in skeletal muscle, causes malignant hyperthermia and
halothane sensitivity in the homozygous condition (REF. 21).
However, this mutation is also associated with high lean-muscle
content (that is, more muscle and less fat).
Therefore, strong selection for leaner pigs between 1960
and 1990 markedly increased the frequency of this
mutation during that period. As a result, malignant
hyperthermia was a major practical
problem in commercial pig production until a
DNA test for this mutation became available.

Other examples come from genes that influence the 
colour of domestic animals. The combined effect of a
gene duplication and a splice mutation in the KIT gene
causes dominant white coat colour in pigs (REF. 22). However,
these mutations have pleiotropic effects on haematopoiesis
and the locus behaves as a QTL that influences
the number of red and white blood cells. A parallel example
is a missense mutation in the KIT ligand gene
(KITLG) that causes roan/white colour in cattle and is
associated with developmental anomalies of Müllerian
ducts (also known as white heifer disease) with complex,
multifactorial inheritance (REF. 23).

The positional candidate approach. In the positional 
candidate strategy, linkage analysis is used to map the
locus to a specific chromosomal region. This region is 
subsequently scrutinized for candidate genes that might 
influence the trait being studied. The next step is then to 
search for causative mutations in the candidate gene. 
This approach is the most common strategy for dissect- 
ing monogenic traits in mammals as there are complete 
genome sequences for several species and the functional 
characterization of genes is continuously improving. 
However, the generally poor resolution of initial QTL 
mapping means that this approach is more difficult to 
apply to multifactorial traits. Specifically, the region that a 
QTL is mapped to might contain too many plausible can- 
didate genes and even several poorly characterized genes 
that cannot be excluded as candidates. Occasionally, how- 
ever, positional candidate cloning can be a quick way of 
identifying a causative gene that can be confirmed by 
further genetic data or functional assays. 

An example of the successful use of this approach is 
the identification of allelic variants at the PRKAG3 locus
that encodes a subunit of AMP-activated protein kinase
(AMPK), an enzyme that has a key role in the metabolic 
ADVANCED INTERCROSS LINES
The subsequent generations
(F3, F4 and so on) of an
intercross, which are maintained
to allow the high-resolution
mapping of quantitative trait
loci.

HAPLOTYPE
A combination of alleles at
different loci that is transmitted
together from one generation to
the next.



regulation of eukaryotic cells. Pigs with the so-called
RN phenotype have a high glycogen content in skeletal
muscle and a high lean-meat content, but the quality
of the meat is not as good for processing as that
from normal pigs. A pure positional cloning approach
was used to identify the causative mutation as a missense
substitution (R225Q) in PRKAG3 (REF. 24). This
mutation might be considered as a QTL allele, but the
phenotypic difference between genotypes is so large
that this trait is effectively monogenic. However,
Ciobanu et al. (REF. 25) subsequently found a QTL peak for
several meat-quality traits including glycogen content
that mapped to the same region as PRKAG3. The
R225Q mutation was not involved, but three other
missense mutations in this gene were potential candidates.
Further studies in several commercial pig lines
have confirmed that at least one of these mutations,
at a residue adjacent to the R225Q mutation
(V224I), influences glycogen content and meat
quality (REF. 25). Interestingly, it has recently been shown
that these two residues are located in a part of AMPK
that is directly involved in the binding of AMP, a key
step in the allosteric activation of the enzyme.



[Figure 1: schematic of chromosomes carrying QTL alleles Q1 and Q2 at generation 0, generation 2 and generation n; image not reproduced.]

Figure 1 | Identical-by-descent mapping. Assume that the quantitative trait locus (QTL) allele
Q2 originates by mutation from allele Q1 at generation 0. There will be complete linkage
disequilibrium between Q2 and alleles at all other loci in the first gamete carrying Q2. This linkage
disequilibrium will then gradually decay through each generation owing to recombination, but
linkage disequilibrium will persist for closely linked loci. At generation n a sample of chromosomes
is collected and classified (Q1 or Q2) by segregation analysis. Genetic markers and sequence
analysis are then used to define the minimum haplotype that is shared identical by descent
among animals carrying Q2 (indicated by the yellow bar).



Identical-by-descent mapping. In model organisms, further
breeding experiments or advanced intercross lines
can be used to refine the map position of QTLs.
However, with the exception of the chicken, the application
of such an approach to domestic animals is expensive.
A promising alternative approach is to combine
linkage and linkage-disequilibrium analysis. The
basic principle of this approach is outlined in FIGURE 1.
Assume that a QTL allele Q1 mutates to Q2 at generation 0.
In the first generation there is complete linkage
disequilibrium between Q2 and the alleles at all other
polymorphic loci on the same chromosome. In each
subsequent generation, recombination gradually
reduces the size of the block of linkage disequilibrium
that surrounds the QTL. The key to narrowing down
the location of the QTL is to use linkage analysis to
deduce the QTL genotype, and then to use a dense set of
genetic markers to determine the minimum haplotype
that is shared identical by descent (IBD) among the animals
that carry the Q2 allele.
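The two ideas in this paragraph, the decay of linkage disequilibrium through recombination and the search for a minimum shared haplotype, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the studies cited: it uses the standard relation D_n = D_0 (1 - c)^n for recombination fraction c, it treats identity by state across all sampled Q2 chromosomes as a proxy for identity by descent, and the function names and marker data are hypothetical.

```python
def ld_after(d0: float, recomb_fraction: float, generations: int) -> float:
    """Linkage disequilibrium remaining after a number of generations,
    using the standard decay D_n = D_0 * (1 - c) ** n."""
    return d0 * (1.0 - recomb_fraction) ** generations


def shared_segment(haplotypes: list[str], focal: int) -> tuple[int, int]:
    """Largest contiguous block of markers around the focal marker on which
    all haplotypes carry the same allele (inclusive start and end indices)."""
    n_markers = len(haplotypes[0])

    def identical(i: int) -> bool:
        return len({hap[i] for hap in haplotypes}) == 1

    if not identical(focal):
        raise ValueError("haplotypes differ at the focal marker")
    start = focal
    while start > 0 and identical(start - 1):
        start -= 1
    end = focal
    while end < n_markers - 1 and identical(end + 1):
        end += 1
    return start, end


# Hypothetical marker haplotypes of three chromosomes classified as Q2;
# each character is the allele at one marker, and marker 3 is nearest the QTL.
q2_haplotypes = ["10110100", "01110111", "11110010"]
print(shared_segment(q2_haplotypes, focal=3))
print(round(ld_after(1.0, 0.01, 100), 3))
```

With these inputs the shared block spans markers 2 to 4, and about 37% of the initial disequilibrium remains after 100 generations at a recombination fraction of 1%, which illustrates why only closely linked markers stay informative.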

The recent and strong selection that domestic animals
have been subject to makes the IBD-mapping
approach useful in these species. Strong directional selection
in domestic animals has led to selective sweeps in
which alleles at loci that underlie selected traits have
increased markedly in their frequency. This process leads
to a loss of heterozygosity in the flanking region owing to
'hitch-hiking' (FIG. 2). The size of the region that shows
a hitch-hiking effect will depend on how quickly the
favourable haplotype becomes fixed (homozygous) and
the recombination rate in the interval. After a selective
sweep the occurrence of new mutations slowly restores
the heterozygosity. However, the short evolutionary history
of animal domestication indicates that the genomic
footprints of major selective sweeps should largely
remain.
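The loss of heterozygosity around a swept locus can be illustrated with a per-marker scan. The sketch below computes expected heterozygosity (gene diversity, 1 minus the sum of squared allele frequencies, which assumes random mating) at each marker in a sample of haplotypes; a run of markers with values near zero is the kind of footprint described above. The function names and the sample data are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles: list[str]) -> float:
    """Gene diversity at one marker: 1 - sum of squared allele frequencies."""
    counts = Counter(alleles)
    total = len(alleles)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def heterozygosity_profile(haplotypes: list[str]) -> list[float]:
    """Expected heterozygosity at each marker position along the haplotypes."""
    n_markers = len(haplotypes[0])
    return [
        expected_heterozygosity([hap[i] for hap in haplotypes])
        for i in range(n_markers)
    ]


# Hypothetical sample of four haplotypes typed at five markers;
# markers 2 and 3 are fixed, as expected inside a swept region.
sample = ["ABAAB", "BBAAA", "AAAAB", "BAAAA"]
print(heterozygosity_profile(sample))
```

In practice such a profile would be averaged over windows and compared with the genome-wide background before a region is interpreted as a selective sweep.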

The IBD approach was used in cattle to position a
QTL for twinning rate to an interval of less than 1 cM.
It was also used to identify putative causative mutations
in the diacylglycerol acyltransferase (DGAT) and
growth hormone receptor (GHR) genes, which underlie
two major QTLs for milk-production traits on cattle
chromosomes 14 and 20, respectively (see REF. 10). Subsequent
genetic and functional studies have now provided
strong evidence that a missense mutation in DGAT
(K232A) is the mutation that underlies a QTL for milk
yield and composition. The insulin-like growth factor 2
(IGF2) QTL in pigs is another prime example of how
this approach has allowed extraordinary resolution in
QTL mapping, down to a single nucleotide substitution
in a non-coding region (REFS 7,20) (BOX 3).

The IGF2 story illustrates some of the advantages of 
using domestic animals for QTL studies. The genetic 
evidence pinpointing the mutation that underlies the 
IGF2 QTL was obtained because the ancestral haplotype,
which only differed at the quantitative trait
nucleotide (QTN), was available. There is a better
chance of the ancestral haplotype being available in
domestic species than for most other species. This is
partly because selective sweeps in domestic animals have
often occurred within a fairly short period of time, 
WHOLE-GENOME SHOTGUN
The random generation of short
DNA-sequence reads from the
whole genome.

FINGERPRINT MAP
A map of a clone or a genome
that is based on the pattern of
fragments that are generated by
restriction enzyme digestions.



which increases the chance that the ancestral haplotype is 
still present in some populations. Moreover, in principle, 
animal geneticists have access to the world population of 
domestic animals, so the entire diversity of haplotypes 
that is present in these species can be characterized. By 
contrast, laboratory strains of most experimental 
organisms represent only a tiny fraction of the genetic 
diversity of these species 107 . 

The mapping of the IGF2 QTN also illustrates the
advantage that the QTL-mapping approach to finding
genes that control multifactorial traits might have compared
with mutagenesis-screening programmes (which
are often considered to be a more productive way of
approaching this task). The IGF2 QTN only has a limited
effect on muscle mass in the pig (a 3-4% increase),
so it would be extremely difficult to identify the
causal mutation in this case with any other genetic
screening method available at present. In particular,
this limited effect would prevent its identification in a
high-throughput mutagenesis-screening programme.

Domestic-animal genomics 

The basic tools for genome research have been estab- 
lished for all principal domestic species: hundreds of 
microsatellite markers; low-resolution linkage and 
physical maps; and large-insert genomic libraries (see 
the online links box). However, despite the notable 
successes that are discussed above, positional cloning 
of trait loci in domestic animals has been hampered 
by the absence of high-resolution linkage maps (several
markers per cM), comprehensive collections of expressed
sequence tags (ESTs), whole-genome bacterial
artificial chromosome (BAC) contigs and whole-genome
sequences. Positional cloning of monogenic
trait loci and QTLs has therefore relied on comparative
mapping using primarily the human and mouse
maps, and more recently genome sequences. This is a
laborious approach as it is often necessary to design 
numerous primer pairs for PCR that are based on 
conserved sequences from other species, and to gener- 
ate local BAC contigs for each region of interest. 
Furthermore, there is always the risk of a minor chro- 
mosomal rearrangement between the target and refer- 
ence species that might slow down progress. This 
approach is particularly cumbersome in the chicken 
owing to the large evolutionary distance between birds 
and mammals (~300,000,000 years).

However, the development of extra genome resources 
and the imminent completion of draft genome sequences 
for several of the principal domestic animals should 
soon remove the bottlenecks that are hampering the 
positional cloning of QTLs in domestic animals (see 
TABLE 2 and below). Efforts are now underway to generate
high-quality draft (HQD) genome sequences for
chicken, dog and cattle. These will not be finished 
genome sequences, so there will be sequence gaps and 
there will be errors in the assembly. However, it will be 
relatively straightforward for researchers to generate fin- 
ished sequences for their regions of interest using the 
HQD sequence combined with the finished sequence 
from other vertebrates. 
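
The relationship between sequence coverage and the gaps that remain in a draft can be made concrete with a back-of-the-envelope calculation. The sketch below assumes the idealized Lander-Waterman (Poisson) model, in which reads land uniformly at random, so that the expected fraction of bases hit by no read at coverage c is e^(-c). The model and the function name are illustrative assumptions rather than part of the sequencing projects described here, and real assemblies deviate from this model because of repeats and cloning bias.

```python
import math


def expected_uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases not covered by any read.

    Assumes reads are placed uniformly at random (Poisson model),
    which real shotgun libraries only approximate.
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return math.exp(-coverage)


# Coverage levels of the kind listed in TABLE 2 (survey versus draft).
for label, c in [("1x", 1.0), ("1.5x", 1.5), ("6.6x", 6.6)]:
    missing = expected_uncovered_fraction(c)
    print(f"{label}: about {missing:.1%} of bases expected to be unsampled")
```

Under these assumptions a 1x survey leaves roughly a third of the genome unsampled, whereas a draft at more than 6x leaves only a small residue, which is why a draft is a practical starting point for finishing individual regions of interest.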



Chicken. Chicken will be the first domestic animal to 
have its genome sequenced to near completion. At pre- 
sent, the Washington University School of Medicine 
Genome Sequencing Center is completing a HQD 
genome sequence that is based on 6.6x whole-genome 
shot-gun (WGS) sequence and end-sequencing of large-insert 
clones combined with the generation of a BAC 
fingerprint map (for chicken genome projects, see online 
links box). The sequence is based on genomic DNA 
from a single red junglefowl female. Sequence reads 
providing ~6x coverage have already been deposited in 
GenBank's trace archive (see online links box) and a 
draft genome assembly will be released during February 
2004 (TABLE 2). The assembly of the genome sequence 
will be facilitated by the small genome size (~1.1 gigabase 
pairs (Gb)) and the low frequency of repetitive 
sequences compared with mammalian genomes. The 
sequence that is generated using the red junglefowl 
(which is the wild ancestor of the domestic chicken) will 
be complemented with a sequencing effort by the 
Beijing Genomics Institute (see online links box) that is 
expected to generate ~1x coverage using genomic DNA 
from three breeds of domestic chickens (TABLE 2). 

Dog. A partial genome sequence (1.5x coverage) that is 
based on the sequence of a poodle has already been 
published 39 . The bioinformatic analysis of these data 
identified fragments representing putative orthologues 
of ~75% of all annotated human genes, and more than 
4% of the non-coding sequence was found to be con- 
served between dog, human and mouse. The average 
sequence identity for orthologous genes was higher 
between dog and human than between human and 
mouse, despite the evidence that the dog is the phyloge- 
netic outgroup of these three species. This result is 
explained by a higher substitution rate in the rodent lin- 
eage. This survey sequence is a valuable resource for dog 
genomics. As an example, almost 1,000,000 putative sin- 
gle nucleotide polymorphisms (SNPs) and 150,000 

Figure 2 | Loss of heterozygosity owing to a selective 
sweep of a favourable mutation. In an outbred population 
the heterozygosity varies along a chromosome according to 
the local mutation rate, previous selection history and genetic 
drift. Strong directional selection in domestic animals (and in 
other species) is expected to cause selective sweeps in which 
a favourable allele replaces other alleles. This leads to 
homozygosity at the selected locus and also at flanking loci 
owing to 'hitch-hiking' >4 . This characteristic pattern means that 
dense genome scans can show regions of the genome that 
have gone through selective sweeps. 



NATURE REVIEWS | GENETICS 



VOLUME i | MARCH 2004 | 307 






polymorphic microsatellites were identified as heterozy- 
gous positions in the single dog that is being sequenced. 

A HQD genome sequence (~6.5x coverage) of 
another breed of dog (a boxer) will be completed by 
Spring 2004 (TABLE 2). Moreover, a large-scale SNP discov- 
ery project to identify SNPs that are common to many 
breeds is also underway. The HQD will be an important 
leap forward for dog genomics and will facilitate the 



identification of the causative mutations for some of the 
plethora of monogenic disorders that are found in 
dogs 40 . The recent identification of a mutation that 
causes renal cystadenocarcinoma and nodular der- 
matofibrosis in the German Shepherd dog illustrates the 
potential of this approach 41 . Furthermore, the rich 
and interesting phenotypic diversity in the morphology 
and behaviour in dogs (for example, see BOX 1) will be 

Box 3 | The IGF2 quantitative trait locus in pigs 

Insulin-like growth factor 2 (IGF2) was first identified as a paternally expressed quantitative trait locus (QTL) in intercrosses 
between the European wild boar and Large White domestic pigs, and between Large White and Pietrain pigs 69,70 . In the wild-boar 
intercross, the QTL allele from the domestic pig was associated with high muscularity, less backfat and a larger heart. 
Sequence analysis showed that the IGF2 haplotypes in the Swedish Large White and Pietrain pigs were identical by descent, 
whereas the Belgian Large White and wild-boar haplotypes were similar, which indicated the presence of two alleles that are 
denoted Q and q for high and low muscle growth, respectively 7 . Another intriguing observation was the large sequence 
divergence (~1%) between the two haplotypes. This led to the suspicion that the two haplotypes might have an Asian and 
European origin, in line with the previous finding that some European breeds, including the Large White, are hybrids of Asian 
and European domestic pigs 48,71 . Sequence analysis of IGF2 haplotypes segregating in an intercross between Chinese 
Meishan and Large White pigs confirmed this hypothesis. The Meishan allele, which was functionally equivalent to IGF2*q, 
was nearly identical to IGF2*Q at the sequence level. These data provided conclusive evidence that the causative mutation 
for IGF2*Q was a G-to-A substitution at nucleotide 3,072 in intron 3. IGF2 was identified as a positional candidate gene, but 
the quantitative trait nucleotide (QTN) ,J was identified by pure genetics: linkage analysis to deduce QTL genotypes 
combined with an analysis of the minimum shared haplotype. 

Functional studies showed a plausible mechanism for the QTL effect (see figure). First, the mutation does not affect the 
imprinting or methylation status of the QTN region and the region is undermethylated in skeletal muscle. Second, the 
wild-type sequence binds a nuclear factor and this interaction is abrogated by the mutation and by methylation. Third, 
transfection analysis indicated that the wild-type sequence functions as a silencer element, whereas the mutant sequence is 
a significantly weaker silencer. Finally, expression analysis showed an approximately threefold upregulation of IGF2 
expression in postnatal skeletal and cardiac muscle but not in prenatal muscle or in liver. The result is consistent with 
phenotypic data showing that IGF2*Q is associated with high muscle growth and a larger heart, but has no effect on birth 
weight or the size of the liver. The IGF2 QTL is truly adaptive from a pig production point of view as it does not affect birth 
weight but supports muscle growth after birth. The photographs of the wild boar, Meishan, Pietrain and Large White pigs 
were provided by B. Kristiansson, Quality Genetics AB, J.-M. Beduin and the Roslin Institute, respectively. 

Table 2 | Present status of efforts to determine the genome sequences of domestic animals 

Species   Genome size*   Number of ESTs*   Institute   Coverage     Time schedule 
Chicken   ~1.1 Gb        451,655           WashU       6.6x WGS     February 2004‡ 
                                           Beijing     1x WGS       Spring 2004§ 
Dog       ~2.8 Gb        27,010            TIGR        1.5x WGS     Published (see REF. 39) 
                                           Broad       ~6.5x WGS    Spring 2004|| 
Cattle    ~3 Gb          331,140           Baylor      7x WGS       2005¶ 
Pig       ~2.7 Gb        240,001           Beijing     ~1x WGS      2004# 
Horse     ~3 Gb          15,240 
Cat       ~3 Gb          228 
Sheep     ~3 Gb          6,748 

*GenBank, 9 January 2004. ‡Wesley Warren, personal communication. §Bin Liu, personal communication. ||Kerstin Lindblad-Toh, personal 
communication. ¶George Weinstock, personal communication. #Merete Fredholm and Bin Liu, personal communication. 
Baylor, Human Genome Sequencing Center at Baylor College of Human Medicine; Beijing, Beijing Genomics Institute; Broad, Broad Institute; 
EST, expressed sequence tag; Gb, gigabase pairs; TIGR, The Institute for Genomic Research, Rockville; WashU, The Genome Sequencing 
Center at Washington University School of Medicine; WGS, whole-genome shot-gun. (See associated web sites in the online links box.) 

able to be studied in much more detail when the HQD 
sequence becomes available. However, the identifica- 
tion of the mutations that underlie interbreed diversity 
will be challenging as few attempts have been made to 
generate or collect informative intercross pedigrees that 
can be used for genetic studies. 

Cattle. An effort to generate a HQD genome sequence 
from cattle has been initiated and the plan is to carry out 
BAC skim sequencing (~1x coverage), 5x WGS reads 
from a single, partially inbred Hereford animal, and 
finally 1x WGS reads from animals representing different 
breeds to allow extensive SNP detection (TABLE 2). 
In addition, a BAC fingerprint map is now being gen- 
erated for cattle (for cattle genome projects, see online 
links box). These will be important resources for the 
ongoing efforts to identify genes that influence milk- 
and meat-production traits, as well as the genes that 
influence susceptibility to infectious diseases in this 
species. The cattle genome sequence will also be a valu- 
able resource for sheep and goat genomics owing to 
the close evolutionary relationship between these 
species (~20,000,000 years). 

Pig. A Chinese and Danish collaboration has generated 
about 800,000 ESTs from several pig tissues and a partial 
genome sequence (~1x coverage). This sequence 
information will be released during 2004 (TABLE 2). 
Unfortunately, no funding has yet been secured for a 
HQD genome sequence and the National Human 
Genome Research Institute (NHGRI) has given 
medium priority to the sequencing of the pig genome 
(see online links box). Pig genomics will benefit to some 
extent from access to a HQD genome sequence for cat- 
tle, but these species diverged early during the evolution 
of the Cetartiodactyla lineage, about 60,000,000 years 
before present 42 . 

SNP detection. The WGS efforts for chicken, dog and 
cattle will primarily be based on a single individual as 
the assembly of the genome sequence is facilitated by 
a reduced genetic heterogeneity. The drawback with 
this approach is that the sequence data will be less 



informative for finding polymorphisms. In the 
chicken, this will be compensated by the generation of 
~1x genome coverage from domestic chickens, which 
is expected to reveal millions of chicken SNPs. Similar 
efforts are underway in dogs and in cattle (see above; 
see also the USDA Meat Animal Research Center 
(MARC) genomic resources in online links box). 
There are also initiatives to detect numerous SNPs 
from available EST resources in chicken (see BBSRC 
ChickEST database in the online links box), cattle 43 
(for cattle genome resources, see the online links box) 
and in pigs 44 (M. Fredholm, personal communication). 

The future of complex-trait analysis 

The generation of HQD genome sequences will allow the 
full potential of using domestic animals for deciphering 
the genetic basis of multifactorial traits to be realised. The 
access to the genome sequence will speed up QTL detec- 
tion in several ways (FIG. 3). Low-resolution QTL mapping 
in commercial and experimental populations is already 
well established. High-resolution IBD mapping, which is 
vital for positional cloning of QTLs, will soon be facili- 
tated by the access to numerous microsatellite and SNP 
markers, and possibly haplotype maps as have been 
developed for inbred strains of mice 45 and for humans 46 . 
The identification of candidate genes and mutations will 
be facilitated by the access to species-specific genome 
sequences, and the re-sequencing of QTL intervals will be 
an attractive approach if the achieved map resolution is 
sufficiently good. 

The access to dense marker maps should open up the 
possibility for a new approach for QTL detection. If 
the marker density is sufficiently high then marker 
screenings using a limited number of animals (or pools of 
animals) representing different populations of domestic 
animals should detect the footprints of selective sweeps, 
such as the one observed for IGF2 in pigs (FIG. 2). Several 
factors influence the expected size of haplotype blocks 
that are fixed by selective sweeps. A high recombination 
rate will reduce the length of haplotype blocks. If the 
effective population size is small, large haplotype blocks 
might be fixed owing to genetic drift. For example, inbred 
lines of laboratory mice have large haplotype blocks and 

Figure 3 | Approaches to mapping and positional cloning of QTLs in domestic animals. 

The segregation of quantitative trait loci (QTLs) can be detected in family material from 
commercial populations or from experimental crosses. QTL mapping in intercrosses between 
divergent populations has an excellent power for QTL detection, owing to the high heterozygosity 
at QTLs in the F1 generation. However, the resolution is rather poor as it is based on those 
recombination events that occur in the experimental pedigree. So, an initial low-resolution 
mapping in an intercross can be followed up by high-resolution linkage-disequilibrium mapping 
within a commercial population, if the QTL is segregating within such populations, or by the 
detection of the minimum shared haplotype representing a selective sweep. The identification of 
the causative mutation for the insulin-like growth factor 2 (IGF2) QTL in pigs is an excellent 
illustration of how this combined approach has been used (BOX 3). 



most of these have been fixed by drift 45 . In certain dog 
breeds it can be difficult to distinguish haplotype 
blocks that have been fixed by selection from those that 
have been fixed by drift, as many breeds have been 
established with a limited number of founder animals 
or have gone through severe population bottlenecks. 
The size of haplotype blocks is strongly influenced by 
the selection intensity as it determines how quickly a 
haplotype carrying a favourable mutation reaches fixa- 
tion. So, modern breeding programmes with intense 
selection are expected to cause fixation of larger haplo- 
type blocks compared with the less intensive animal 
breeding that was carried out before the twentieth cen- 
tury. However, even QTL alleles with fairly large effects 
are not expected to be fixed in a few generations as they 
only explain, by definition, a fraction of the genetic 
variance for a given trait. 
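
The link between selection intensity and the size of the fixed haplotype block can be illustrated with a deliberately simple deterministic model. The sketch assumes genic (haploid-style) selection with coefficient s, a starting frequency of 1% and a flanking marker at recombination fraction r from the selected site. Genetic drift is ignored, and all function names and parameter values are hypothetical choices made only for illustration.

```python
def generations_to_near_fixation(s: float, p0: float = 0.01, target: float = 0.99) -> int:
    """Generations for a favoured allele to rise from p0 to target.

    Uses the deterministic genic-selection recursion
    p' = p(1 + s) / (1 + p*s); genetic drift is ignored.
    """
    if s <= 0:
        raise ValueError("s must be positive for a favoured allele")
    p, generations = p0, 0
    while p < target:
        p = p * (1 + s) / (1 + p * s)
        generations += 1
    return generations


def fraction_unrecombined(r: float, generations: int) -> float:
    """Chance that a site at recombination fraction r has stayed linked
    to the favoured mutation over the given number of generations."""
    return (1 - r) ** generations


for s in (0.02, 0.10, 0.50):
    t = generations_to_near_fixation(s)
    kept = fraction_unrecombined(0.01, t)  # flanking marker 1 cM away
    print(f"s={s:.2f}: {t} generations, {kept:.0%} still linked at 1 cM")
```

In this toy model a weakly selected allele needs many more generations to approach fixation, so recombination has more opportunities to erode the flanking haplotype, which is the qualitative pattern described above.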

There is an inherent conflict concerning the optimal 
size of haplotype blocks to be used for the detection of 
QTLs. The larger a fixed haplotype block is the easier it 
is to detect, but it will then be more difficult to identify 
the causative gene and mutation. The general trend is 
that population-wide linkage disequilibrium in livestock 
extends over tens of cM rather than <1 cM regions, 
as typically observed in humans 47 . However, it is often 
the case that there are several breeds of domestic ani- 
mals that are selected for the same purpose and there is 
often some gene flow between similar breeds. So, the 
same IBD segment might be favoured by selection in 
different breeds. This is very well illustrated by the IGF2 
QTL in pigs 7 . The same IBD segment was identified in 
four different breeds of pig. The region of strong linkage 
disequilibrium within breeds extended over several- 
hundred kb, but the minimum IBD haplotype that was 
identified across breeds was only 20 kb. Furthermore, 
many domestic breeds have a mosaic structure of varia- 
tion owing to recent or historical admixture of divergent 
breeds. The introgression of Asian domestic pigs into 
European domestic breeds during the eighteenth and 
nineteenth century is one good example of recent 
admixtures 48 . Similarly, some breeds of African cattle are 
the products of a progressive, male-driven admixture 
between the indigenous taurine breeds (Bos taurus) and 
immigrating zebu cattle (Bos indicus) 49 . In both of these 
examples admixture has occurred between populations 
that originate from different subspecies of the wild 
ancestors of the livestock species. This situation can in 
fact greatly facilitate the detection of selective sweeps 
owing to the sequence divergence between haplotypes 
that originate from different subspecies (BOX 3). 

So, the marker density that is required to detect hap- 
lotype blocks that are fixed by selective sweeps will vary 
from population to population on the basis of its his- 
tory, and from locus to locus depending on the local 
recombination rate and how quickly the favourable 
allele reached fixation. Suitable marker densities can 
be established empirically using test cases such as 
myostatin in cattle, IGF2 in pigs, as well as coat-colour 
loci: a marker density of at least 10 markers per cM will 
be required for QTLs, which corresponds to ~30,000 
markers for a genome-wide scan. Ideally WGS reads 
would be generated using representatives of different 
breeds to 0.1-1x coverage to uncover IBD regions 
within and across breeds, or even better to >2x cover- 
age to uncover most variants. The cost of SNP genotyp- 
ing or sequencing will, in the near future, limit the 
practical application of this IBD scanning approach. 
However, it should be an attractive approach for 
selected regions of the genome that contain important 
QTLs and the access to HQD genome sequences will 
make the approach feasible. 
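
The arithmetic behind these figures can be written out as a short sketch. It assumes a genetic map length of about 3,000 cM, which is the value implied by the figures quoted above (10 markers per cM giving roughly 30,000 markers), and a genome size of about 3 Gb as listed for several species in TABLE 2. The function names are hypothetical.

```python
def markers_for_genome_scan(markers_per_cm: float, map_length_cm: float) -> int:
    """Number of markers needed for a given density over the whole map."""
    return round(markers_per_cm * map_length_cm)


def bases_for_coverage(coverage: float, genome_size_bp: int) -> int:
    """Raw sequence (in bp) needed to reach a given average coverage."""
    return round(coverage * genome_size_bp)


MAP_LENGTH_CM = 3_000           # assumed map length implied by the text
GENOME_SIZE_BP = 3_000_000_000  # assumed ~3 Gb genome, as in TABLE 2

print(markers_for_genome_scan(10, MAP_LENGTH_CM), "markers at 10 per cM")
for cov in (0.1, 1.0, 2.0):
    gb = bases_for_coverage(cov, GENOME_SIZE_BP) / 1e9
    print(f"{cov}x coverage per breed needs about {gb:.1f} Gb of reads")
```

Both quantities scale linearly, so restricting the scan to a selected region of a few cM reduces the genotyping or sequencing effort in direct proportion.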

The IBD scanning method of domestic breeds and 
their wild ancestors might also uncover chromosomal 
regions that contain important mutations that have 
been crucial during domestication. Such regions are 
expected to affect behaviour, reproduction, production 
and possibly coat colour. Furthermore, the comparison 
of animals representing breeds that are selected for dif- 
ferent purposes, such as dairy and beef cattle and egg- 
and meat-producing chickens, should uncover the loci 
that have responded to selective breeding as those show- 
ing low diversity within breeds but high diversity 
between divergently selected breeds. The proposed 
approach does not initially require any segregation 
analysis and the minimum IBD region surrounding a 
selected region is expected to be small (<1 cM) unless 
the selective sweep was completed in a few generations. 
However, a problem with this approach is, of course, 
that it does not show which trait or traits the selected 
locus affects. Furthermore, false positives will occur 
because non-selected regions might become fixed 
owing to random genetic drift. Therefore, it might be 
necessary to combine this approach with segregation 
analysis to confirm the presence of QTLs and to establish 
genotype-phenotype relationships. 
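
The scanning idea described above, low diversity within breeds combined with high diversity between divergently selected breeds, can be sketched as a sliding-window filter over marker allele frequencies. This is a toy illustration and not a published method: the window size, the thresholds, the function names and the example frequencies are all hypothetical, and a real scan would need to account for drift, marker spacing and sample size.

```python
from statistics import mean


def expected_heterozygosity(freq: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic marker."""
    return 2 * freq * (1 - freq)


def sweep_candidates(freqs_a, freqs_b, window=3, max_within=0.1, min_between=0.8):
    """Return start indices of windows that look like divergent sweeps.

    freqs_a, freqs_b: allele frequencies at the same ordered markers
    in two divergently selected breeds.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both breeds must be typed for the same markers")
    hits = []
    for start in range(len(freqs_a) - window + 1):
        a = list(freqs_a[start:start + window])
        b = list(freqs_b[start:start + window])
        # Low diversity within each breed across the window ...
        within = mean(expected_heterozygosity(p) for p in a + b)
        # ... but large allele-frequency differences between breeds.
        between = mean(abs(pa - pb) for pa, pb in zip(a, b))
        if within <= max_within and between >= min_between:
            hits.append(start)
    return hits


# Made-up frequencies for seven ordered markers in two breeds.
dairy = [0.5, 0.6, 1.0, 1.0, 0.98, 0.4, 0.5]
beef = [0.4, 0.5, 0.0, 0.02, 0.0, 0.5, 0.6]
print(sweep_candidates(dairy, beef))
```

A window flagged in this way only marks a candidate region; as noted above, segregation analysis is still needed to link it to a trait and to exclude fixation by drift.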

The generation of draft genome sequences will also 
make it possible to construct comprehensive oligonu- 
cleotide arrays for expression analysis. This might turn 
out to be a useful complement to QTL mapping 
for identifying genes that underlie complex traits. For 
example, the threefold upregulation of IGF2 expression 
in pigs that carry the intron 3 G3072A mutation 7 
indicates that IGF2 could have been detected as a differ- 
entially expressed gene by an array analysis of skeletal- 
muscle mRNA from the wild boar and domestic pigs. 
However, a problem with this type of association analy- 
sis is that it is not possible to resolve whether an 
observed differential expression is a direct effect that is 
due to a cis-acting regulatory element, or a secondary 
effect that is due to changes in gene expression at other 
loci. More recently this issue has been addressed by an 
approach that is known as genetical genomics, in which 
the expression of each individual gene is treated as a 
trait in QTL analysis 50,51 . The approach is costly as it 
requires expression analysis of numerous individuals 
(>100), and there is also a problem in defining the 
appropriate statistical significance thresholds owing to 
the many different tests that need to be carried out. 
Nonetheless, the approach has the potential to uncover 
interesting cis- and trans-acting effects of regulatory 



mutations. Finally, the emerging field of proteomics 
might provide further tools for the dissection of the 
molecular basis for phenotypic variation 52 . 
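
The scale of the multiple-testing problem in genetical genomics can be illustrated with a simple calculation. The sketch below uses a Bonferroni correction, which is only one possible (and conservative) choice because it treats all tests as independent even though linked markers and co-regulated transcripts are correlated. The numbers of transcripts and markers are hypothetical.

```python
import math


def bonferroni_threshold(n_transcripts: int, n_markers: int, alpha: float = 0.05):
    """Per-test p-value threshold when every transcript is tested
    against every marker, with its -log10 value for convenience."""
    n_tests = n_transcripts * n_markers
    per_test_p = alpha / n_tests
    return per_test_p, -math.log10(per_test_p)


# Hypothetical experiment: 10,000 transcripts scanned against 300 markers.
p_cutoff, neg_log10 = bonferroni_threshold(10_000, 300)
print(f"per-test threshold: {p_cutoff:.2e} (-log10 p = {neg_log10:.1f})")
```

Even this modest hypothetical design involves millions of tests, which is why the choice of significance threshold needs careful attention in such studies.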

Conclusions 

The generation of complete genome sequences for 
domestic animals is justified not only by the agricul- 
tural importance of these species, but also by the poten- 
tial contributions of these genome projects to basic 
biology and human medicine. Genome research in 
these species is generally regarded as applied science, the 
aim being to generate agricultural applications. These 
agricultural applications are certainly the greatest impe- 
tus for ongoing genome sequencing programmes in 
domestic animals, and molecular information will be 
increasingly important in practical breeding pro- 
grammes. However, we argue that domestic animals will 
also contribute to basic biology as they provide unique 
opportunities for unravelling the genetic basis of phe- 
notypic variation. Furthermore, human medicine will 
also benefit from the progress in domestic-animal 
genomics. Domestic animals provide models for disorders 
with both a monogenic and multifactorial background 
(see TABLE 1 and the QTLs discussed above). So, 
post-genomic studies of domestic animals will lead to 
new knowledge of gene function and biochemical path- 
ways that are altered in disease conditions, and will 
expand the availability of large-animal models for the 
testing of new disease treatments. 